Discovering Active Motifs in Sets of Related Protein Sequences and Using Them for Classification
Authors
Abstract
We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high-confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102.
Cold Spring Harbor Laboratory, 100 Bungtown Road, Cold Spring Harbor, NY 11724.
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012.
Image Processing Section, Laboratory of Mathematical Biology, Division of Cancer Biology and Diagnosis, National Cancer Institute, National Institutes of Health, Frederick, MD 21701.
Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ 07102.
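The two-step process in the abstract (find candidates in a small sample, then test approximate presence in all sequences) can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the paper: a motif is a fixed-length substring, "approximately present" means within a given number of mismatches (Hamming distance) of some window, and the sample is simply the first few sequences. The function names and parameters are hypothetical, and the paper's two optimization heuristics are not modeled.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approximately(motif, sequence, max_mismatches):
    """True if some window of the sequence is within max_mismatches of the motif."""
    m = len(motif)
    return any(
        hamming(motif, sequence[i:i + m]) <= max_mismatches
        for i in range(len(sequence) - m + 1)
    )


def candidate_motifs(sample, length):
    """Step 1 (assumed form): every substring of the given length in the sample."""
    candidates = set()
    for seq in sample:
        for i in range(len(seq) - length + 1):
            candidates.add(seq[i:i + length])
    return candidates


def discover_motifs(sequences, sample_size, length, max_mismatches, min_fraction):
    """Step 2 (assumed form): keep candidates that approximately occur in at
    least min_fraction of all sequences. Returns (motif, hit_count) pairs."""
    sample = sequences[:sample_size]  # assumption: the sample is a simple prefix
    needed = min_fraction * len(sequences)
    active = []
    for motif in sorted(candidate_motifs(sample, length)):
        hits = sum(
            occurs_approximately(motif, s, max_mismatches) for s in sequences
        )
        if hits >= needed:
            active.append((motif, hits))
    return active


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real protein data.
    seqs = ["MKVLAAGIT", "TTVLAAGPP", "QVLSAGWW", "GGGVLAAG"]
    print(discover_motifs(seqs, sample_size=2, length=5,
                          max_mismatches=1, min_fraction=1.0))
```

Drawing candidates only from the sample keeps the candidate set small, while the second pass over all sequences filters out motifs that are specific to the sample.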
Similar resources
A Hybrid Evolutionary Approach for the Protein Classification Problem
This paper proposes a hybrid algorithm that combines characteristics of both Genetic Programming (GP) and Genetic Algorithms (GAs), for discovering motifs in proteins and predicting their functional classes, based on the discovered motifs. In this algorithm, individuals are represented as IF-THEN classification rules. The rule antecedent consists of a combination of motifs automatically extracte...
Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Allignment Search Method
Degenerate primer-based polymerase chain reaction (PCR) is commonly used for isolation of unidentified gene sequences in related organisms. For designing the degenerate primers, we propose the use of a local alignment search method to find conserved regions long enough to design an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...
The roles of EPIYA sequence to perturb the cellular signaling pathways and cancer risk
It was shown that several pathogenic bacterial effector proteins contain the Glu-Pro-Ile-Tyr-Ala (EPIYA) or a similar sequence. These bacterial EPIYA effectors are delivered into host cells via a type III or IV secretion system, where they undergo tyrosine phosphorylation at the EPIYA sequences, which triggers interaction with multiple host cell SH2 domain-containing proteins and thereby...
Functional motifs in Escherichia coli NC101
Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and, according to recent reports, may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to identify motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used to predict and find functional m...
Classifying Technical Terms
Automating the process of term recognition and classification is important for digital libraries. Automatic Term Recognition (ATR) has many applications in areas related to digital libraries, e.g. information retrieval and extraction from the web, summarisation, machine translation, dictionary construction etc. Automatic term classification attracts the interest of researchers following the steps...